Interpretable clustering using unsupervised binary trees
Authors
Abstract
We herein introduce a new method of interpretable clustering that uses unsupervised binary trees. It is a three-stage procedure. The first stage applies a series of recursive binary splits to reduce the heterogeneity of the data within the new subsamples. The second stage (pruning) considers whether adjacent nodes can be aggregated. The third stage (joining) merges similar clusters, even if they do not descend from the same node. Consistency results are obtained, and the procedure is applied to simulated and real data sets. The Matlab code for the three stages of the algorithm is provided in the Supplemental Materials.
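The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Matlab code, and several choices are assumptions made only for illustration: heterogeneity is measured as the sum of squared distances to the node centroid, splits are axis-aligned, and both pruning and joining compare clusters by centroid distance against a hypothetical threshold `eps`. The function names and the parameters `min_size` and `min_gain` are likewise hypothetical.

```python
from statistics import fmean

def centroid(points):
    return [fmean(p[j] for p in points) for j in range(len(points[0]))]

def deviance(points):
    # Assumed heterogeneity measure: squared distances to the centroid.
    c = centroid(points)
    return sum((x - m) ** 2 for p in points for x, m in zip(p, c))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def best_split(points, min_size):
    # Try every axis-aligned threshold; keep the largest deviance reduction.
    parent, best = deviance(points), None
    for axis in range(len(points[0])):
        values = sorted({p[axis] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [p for p in points if p[axis] <= t]
            right = [p for p in points if p[axis] > t]
            if min(len(left), len(right)) < min_size:
                continue
            gain = parent - deviance(left) - deviance(right)
            if best is None or gain > best[0]:
                best = (gain, axis, t, left, right)
    return best

def grow(points, min_size=3, min_gain=1.0):
    # Stage 1: a leaf is a list of points, a node is (axis, t, left, right).
    split = best_split(points, min_size)
    if split is None or split[0] < min_gain:
        return list(points)
    _, axis, t, left, right = split
    return (axis, t, grow(left, min_size, min_gain),
            grow(right, min_size, min_gain))

def is_leaf(node):
    return isinstance(node, list)

def prune(node, eps):
    # Stage 2: collapse sibling leaves whose centroids are closer than eps.
    if is_leaf(node):
        return node
    axis, t, left, right = node
    left, right = prune(left, eps), prune(right, eps)
    if (is_leaf(left) and is_leaf(right)
            and dist(centroid(left), centroid(right)) < eps):
        return left + right
    return (axis, t, left, right)

def leaves(node):
    return [node] if is_leaf(node) else leaves(node[2]) + leaves(node[3])

def join(clusters, eps):
    # Stage 3: greedily merge any two clusters, siblings or not.
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if dist(centroid(clusters[i]), centroid(clusters[j])) < eps:
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Two well-separated toy groups of six points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.2, 10.8)]
tree = grow(data, min_size=3, min_gain=0.5)   # deliberately over-splits
clusters = join(leaves(prune(tree, eps=2.0)), eps=2.0)
print([len(c) for c in clusters])
```

With a low `min_gain` the first stage over-splits each group, and the pruning step then re-aggregates the sibling leaves. Because every split is a threshold on a single coordinate, each final cluster can be described by a short list of rules, which is the sense in which a tree-based clustering is interpretable.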
Similar resources
Clustering using Unsupervised Binary Trees: CUBT
We introduce a new clustering method based on unsupervised binary trees. It is a three stages procedure, which performs on a first stage recursive binary splits reducing the heterogeneity of the data within the new subsamples. On the second stage (pruning) adjacent nodes are considered to be aggregated. Finally, on the third stage (joining) similar clusters are joined even if they do not descen...
Concept Acquisition for Dialog Agents
Dialog agents capable of autonomously acquiring new concepts are likely to be more powerful than those relying on a fixed set of preprogrammed concepts. kx-trees provide a novel unsupervised learning method for concept acquisition. Through online, incremental, divisive, binary-tree-based clustering, it organizes raw sensory experiences into low-level concepts. Using the same mechanism, it can o...
Learning shape categories by clustering shock trees
This paper investigates whether meaningful shape categories can be identified in an unsupervised way by clustering shock trees. We commence by computing weighted and unweighted edit distances between shock trees extracted from the Hamilton-Jacobi skeleton of 2D binary shapes. Next we use an EM-like algorithm to locate pairwise clusters in the pattern of edit distances. We show that when the tree e...
Indexing Images by Trees of Visual Content
Haim Schweitzer, The University of Texas at Dallas, P.O. Box 830688, Richardson, Texas 75083. Abstract: An unsupervised algorithm for arranging an image database as a binary tree is described. Tree nodes are associated with image subsets, maintaining the property that the similarity among the images associated with the children of a node is higher than the similarity among the im...
Interpretable Multiclass Models for Corporate Credit Rating Capable of Expressing Doubt
Corporate credit rating is a process to classify commercial enterprises based on their creditworthiness. Machine learning algorithms can construct classification models, but in general they do not tend to be 100% accurate. Since they can be used as decision support for experts, interpretable models are desirable. Unfortunately, interpretable models are provided by only few machine learners. Fur...
Journal:
- Adv. Data Analysis and Classification
Volume 7, Issue
Pages -
Publication date: 2013